Abstract

In this paper we describe data structures for orthogonal range reporting in external memory that support fast update operations. The query costs either match the query costs of the best previously known data structures or differ by a small multiplicative factor.

1 Introduction

In the orthogonal range reporting problem, a set of points is stored in a data structure so that, for any $d$-dimensional query range $Q = [a_1, b_1] \times \dots \times [a_d, b_d]$, all points that belong to $Q$ can be reported. Due to its fundamental nature and its applications, the orthogonal range reporting problem has been studied extensively; we refer to e.g. [T01 El H21 El E] for a small selection of important publications.

In this paper we address the issue of constructing dynamic data structures that support fast update 
operations in the external memory model. 

External memory data structures for orthogonal range reporting have also received significant attention, see e.g. [HI H7J [L9\ El HU El E5]. We refer to [E] for the definition of the external memory model and a survey of previous results. In particular, dynamic data structures for $d = 2$ dimensions are described in [16, 17, 6]. The best previously known data structure, due to Arge, Samoladas, and Vitter [6], uses $O((N/B)\log_2 N/\log_2\log_B N)$ blocks of space, answers queries in $O(\log_B N + K/B)$ I/Os, and supports updates in $O(\log_B N \cdot \log_2 N/\log_2\log_B N)$ I/Os; in [6], the authors also show that the space usage of their data structure is optimal. Recently, the first dynamic data structure that supports queries in $O(\log_B^2 N + K/B)$ I/Os in $d = 3$ dimensions was described [15].

All previously described external memory data structures with optimal or almost-optimal query cost need $\Omega((\log_B N \log_2 N)/\log_2\log_B N)$ I/Os to support an insertion or a deletion of a point; see Table 1. This compares unfavorably with the significantly lower update costs that can be achieved by internal memory data structures. For instance, the two-dimensional data structure of Mortensen [12] supports updates in $O(\log_2^{f} N)$ time for any constant $f > 7/8$. Moreover, the update costs of previously described external structures contain an $O(\log_2 N)$ factor. Since the block size $B$ can be large, an update cost that depends only on $\log_B N$ would be desirable. A high update cost is also a drawback of the three-dimensional data structure described in [15]. Reducing the cost of update operations can be important in the dynamic scenario, when the data structure must be updated frequently.

Our Results. We describe several data structures for orthogonal range reporting queries in $d = 2$ dimensions that achieve lower update costs. We describe two data structures that support queries in $O(\log_B N + K/B)$ I/Os. These data structures support updates in $O(\log_B^{1+\varepsilon} N)$ I/Os with high probability and in $O(\log_B^2 N)$ deterministic I/Os, respectively. Henceforth $\varepsilon$ denotes an arbitrarily small positive constant. We also describe a data structure that uses $O((N/B)\log_2 N)$ blocks of space, answers queries in $O(\log_B N(\log_2\log_B N)^2 + K/B)$ I/Os, and supports updates in $O(\log_B N(\log_2\log_B N)^2)$ I/Os. All our results are listed in Table 1.

Source | Query Cost | Update Cost | Space Usage
[?] | $O(\log_B N + K/B)$ | $O(\log_B N \log_2 N \log_2^2 B)$ | $O((N/B)\log_2 N \log_2 B \log_2\log_2 B)$
[?] | $O(\log_B N + K/B + \mathrm{ilog}(B))$ | $O(\log_2 N(\log_B N + (\log_2^2 N)/B))$ | $O((N/B)\log_2 N)$
[6] | $O(\log_B N + K/B)$ | $O(\log_B N \log_2 N/\log_2\log_B N)$ | $O((N/B)\log_2 N/\log_2\log_B N)$
* | $O(\log_B N + K/B)$ | $O(\log_B^{1+\varepsilon} N)$ † | $O((N/B)\log_2 N)$
* | $O(\log_B N + K/B)$ | $O(\log_B^2 N)$ | $O((N/B)\log_2 N/\log_2\log_B N)$
* | $O(\log_B N(\log_2\log_B N)^2 + K/B)$ | $O(\log_B N(\log_2\log_B N)^2)$ | $O((N/B)\log_2 N)$

Table 1: New data structures and some previous results for $d = 2$ dimensions. Our results are marked with an asterisk; † denotes randomized results. The result in the first row of the table can be obtained from the result in [IB] using a standard technique. The function $\mathrm{ilog}(x)$ is the iterated $\log^*$ function: $\mathrm{ilog}(x)$ denotes the number of times we must apply the $\log^*$ function to $x$ before the result becomes $\le 2$, where $\log^*(x) = \min\{t \mid \log_2^{(t)}(x) \le 2\}$ and $\log_2^{(t)}(x)$ denotes the $\log_2$ function repeated $t$ times.

Overview. The situations when the block size $B$ is small and when $B$ is not so small are handled separately. If the block size is sufficiently large, $B = \Omega(\log_2^{c} N)$ for an appropriate constant $c$, our construction is based on the bufferization technique. We show that a batch of $O(B^{1/4})$ updates can be processed with $O(\log_B N)$ I/Os. Hence, we can achieve constant amortized update cost for sufficiently large $B$. In the case when $B$ is small, $B = O(\log_2^{c} N)$, we construct a base tree with fan-out $\log_2^{\varepsilon} N$ or a base tree with constant fan-out. Since $B = \mathrm{polylog}_2(N)$, the height of the base tree is bounded by $O(\log_B N)$ or $O(\log_B N \log_2\log_B N)$, respectively. Hence, we can reduce a two-dimensional query to a small number of simpler queries.

In Section 2 we describe a data structure that supports three-sided reporting queries in $O(\log_B N + K/B)$ I/Os and updates in $O(1/B^{\delta})$ I/Os if $B = \Omega(\log_B^{1/\delta} N)$. Henceforth $\delta$ denotes an arbitrary positive constant such that $\delta < 1/4$. In Appendix A, we generalize this result and obtain a data structure that supports updates in $O(1)$ I/Os and orthogonal range reporting queries in $O(\log_B N + K/B)$ I/Os if $B = \Omega(\log_2^{1/\delta} N)$. Thus, if the block size is sufficiently large, there exists a data structure with optimal query cost and $O(1)$ amortized update cost. We believe that this result is of independent interest. Data structures for $B = O(\log_2^{1/\delta} N)$ are described in Section 3.

2 Three-Sided Range Reporting for $B = \Omega(\log_B^{1/\delta} N)$

Three-sided queries are a special case of two-dimensional orthogonal range queries. The range of a three-sided query is the product of a closed interval and a half-open interval. In this section we assume that the block size satisfies $B^{\delta} > 4h\log_B N$ for a constant $h$ that will be defined later in this section. Our data structure answers three-sided queries with $O(\log_B N + K/B)$ I/Os, and updates are supported in $O(1/B^{\delta})$ amortized I/Os.

Our approach is based on a combination of the external priority tree [6] with the buffering technique [5]. Buffering was previously used to answer searching and reporting problems in one dimension. In this section we show that buffering can be applied to the three-sided range reporting problem in the case when $B = \Omega(\log_B^{1/\delta} N)$. First, we describe the external priority tree [6]. Then, we show how this data structure can be modified so that a batch of $B^{\delta}$ updates can be processed in constant amortized time. Finally, we describe the procedure for reporting all points in a three-sided range $Q = [a, b] \times [c, +\infty)$.

The following lemma is important for our construction.

Lemma 1 A set $S$ of $O(B^{1+\delta})$ points can be stored in a data structure that supports three-sided reporting queries in $O(1 + K/B)$ I/Os, where $K$ is the number of points in the answer; this data structure can be constructed with $O(B^{\delta})$ I/Os.

Proof: We can use the data structure of Lemma 1 from [6]. □ 

External Priority Tree. Leaves of the external priority tree contain the x-coordinates of points in sorted order. Every leaf contains $O(B)$ points and each internal node has $\Theta(B^{\delta})$ children. We assume throughout this section that the height of an external priority tree is bounded by $h\log_B N$. The range $rng(v)$ of a node $v$ is the interval bounded by the minimal and the maximal coordinates stored in its leaves; we say that a point $p$ belongs to (the range of) a node $v$ if its x-coordinate belongs to the range of $v$. Each node is associated with a set $S(v)$, $|S(v)| = \Theta(B)$, defined as follows. Let $L(v)$ denote the set of all points that belong to the range of $v$. The set $S(v)$ contains the $B$ points with the largest y-coordinates among all points in $L(v)$ that do not belong to any set $S(w)$, where $w$ is an ancestor of $v$. Thus the external priority tree is a modification of the priority tree with node degree $B^{O(1)}$, such that each node contains $\Theta(B)$ points.

The data structure $F(v)$ contains the points of $\bigcup_i S(v_i)$ for all children $v_i$ of $v$. By Lemma 1, $F(v)$ supports three-sided queries in $O(1 + K/B)$ I/O operations. Using $F(v)$, we can answer three-sided queries in $O(\log_B N + K/B)$ I/Os; the search procedure is described in [6].
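The static structure can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the construction of [6]: points are in-memory (x, y) pairs with distinct coordinates, `cap` stands in for $B$, `fanout` for $B^{\delta}$, and the names `Node` and `build` are hypothetical. Each node takes the `cap` highest points of its range that no ancestor has claimed.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: int                                       # smallest x-coordinate in rng(v)
    hi: int                                       # largest x-coordinate in rng(v)
    S: list = field(default_factory=list)         # the set S(v)
    children: list = field(default_factory=list)

def build(points, cap, fanout, leaf_size, remaining=None):
    """Toy static external priority tree.

    points:    (x, y) pairs in this node's range, sorted by x.
    cap:       size of each S(v); stands in for B.
    fanout:    children per node (>= 2); stands in for B**delta.
    leaf_size: points per leaf; must not exceed cap.
    """
    if remaining is None:
        remaining = set(points)                   # points not yet placed in any S(w)
    node = Node(points[0][0], points[-1][0])
    # S(v): the `cap` highest points in rng(v) that no ancestor has claimed.
    candidates = sorted((p for p in points if p in remaining),
                        key=lambda p: p[1], reverse=True)
    node.S = candidates[:cap]
    remaining.difference_update(node.S)
    if len(points) > leaf_size:
        step = -(-len(points) // fanout)          # ceil(n / fanout) points per child
        for i in range(0, len(points), step):
            node.children.append(
                build(points[i:i + step], cap, fanout, leaf_size, remaining))
    return node
```

Because leaves hold at most `cap` points, every point ends up in exactly one set $S(v)$, and a nonempty child set lies entirely below a full parent set, which is the ordering later stated as Fact 1.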

Supporting Insertions and Deletions. Now we describe a data structure that supports both insertions and deletions. We show below how a batch of inserted or deleted points can be processed efficiently. The main idea is to maintain buffers with inserted and deleted points in all internal nodes. The buffer $D(v)$, $v \in T$, contains points that are stored in descendants of $v$ and must be deleted. The buffer $I(v)$, $v \in T$, contains points that must be inserted into sets $S(u)$ for a descendant $u$ of $v$. A buffer can contain up to $B^{3\delta}$ elements. When a buffer $I(v)$ or $D(v)$ is full, we flush it into the children $v_j$ of $v$; all sets $I(v_j)$, $D(v_j)$, and $S(v_j)$ are updated accordingly.

The definitions of $S(v)$ and $F(v)$ are slightly modified for the dynamic structure. Every set $S(v)$ contains at most $2B$ points. If $S(v)$ contains fewer than $B/2$ points, then $S(v_i) = \emptyset$ for each child $v_i$ of $v$. The data structure $F(v)$ contains all points from $S(v_j) \cup I(v_j)$ for all children $v_j$ of $v$. We store an additional data structure $R(v)$ in each internal node; $R(v)$ contains all points from $\bigcup_i S(v_i)$ for all children $v_i$ of $v$. $R(v)$ can be constructed in $O(B^{\delta})$ I/Os, and we can obtain the $B^{3\delta}$ points with the highest y-coordinates stored in $R(v)$ in $O(1)$ I/Os. The implementation of $R(v)$ is very similar to the implementation of $F(v)$; details will be given in the full version.

Suppose that all points from a set $V$, $|V| = O(B^{\delta})$, must be deleted. We remove all points of $V \cap S(v_r)$ and $V \cap I(v_r)$ from $S(v_r)$ and $I(v_r)$ respectively, where $v_r$ is the root, and we set $D(v_r) = D(v_r) \cup (V \setminus S(v_r))$. When $D(v)$ for an internal node $v$ is full, $|D(v)| = B^{3\delta}$, we distribute the points of $D(v)$ among the children $v_j$ of $v$. Let $D_j(v)$ be the points of $D(v)$ that belong to the range of $v_j$; we update $S(v_j)$ and $D(v_j)$ as described above: we remove all points of $D_j(v) \cap S(v_j)$ and $D_j(v) \cap I(v_j)$ from $S(v_j)$ and $I(v_j)$ respectively, and all points of $D_j(v) \setminus S(v_j)$ are inserted into $D(v_j)$. Finally, we update $F(v)$ and $R(v)$.
We can insert a batch $X$ of $B^{\delta}$ points using a similar procedure. Initially, all points of $X$ are inserted into the buffer $I(v_r)$ or into $S(v_r)$, and the points of $X \cap D(v_r)$ are removed from $D(v_r)$. Let $S'(v_r) = S(v_r) \cup X$ and let $S''(v_r)$ be the set of $B$ points with the highest y-coordinates in $S'(v_r)$. We set $S(v_r) = S''(v_r)$ and $I(v_r) = I(v_r) \cup (S'(v_r) \setminus S''(v_r))$. When the buffer $I(v)$ in an internal node $v$ is full, $|I(v)| \ge B^{3\delta}$, we update the sets $S(v_j)$ and $I(v_j)$ in the children $v_j$ of $v$. Let $I_j(v)$ be the set of points in $I(v)$ that belong to the range of $v_j$. Let $S'(v_j) = S(v_j) \cup I_j(v)$ and let $S''(v_j)$ be the set of $B$ points with the highest y-coordinates in $S'(v_j)$. We set $S(v_j) = S''(v_j)$, $D(v_j) = D(v_j) \setminus I_j(v)$, and $I(v_j) = I(v_j) \cup (S'(v_j) \setminus S''(v_j))$. The data structures $F(v)$ and $R(v)$ are updated accordingly.
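A minimal Python sketch of one flush of an insertion buffer, following the update rules above. It assumes in-memory sets, children described by hypothetical dicts holding an x-range and the sets `S`, `I`, `D`, and `cap` in place of $B$; the secondary structures $F(v)$ and $R(v)$ are omitted, and `flush_insertions` is a hypothetical name.

```python
def flush_insertions(buffer, children, cap):
    """Empty an insertion buffer I(v) into the children of v.

    buffer:   set of (x, y) points in I(v); cleared on return.
    children: dicts with keys 'lo', 'hi' (x-range) and sets 'S', 'I', 'D'.
    cap:      target size of S(v_j); stands in for B.
    """
    for child in children:
        part = {p for p in buffer if child['lo'] <= p[0] <= child['hi']}  # I_j(v)
        merged = child['S'] | part                                        # S'(v_j)
        top = set(sorted(merged, key=lambda p: p[1], reverse=True)[:cap]) # S''(v_j)
        child['S'] = top
        child['I'] |= merged - top    # points that do not fit stay buffered in v_j
        child['D'] -= part            # a re-inserted point cancels a pending deletion
    buffer.clear()
```

The deletion flush has the same shape, with set differences in place of the top-`cap` selection.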

When a buffer is full, we can rebuild all $I(v_j)$, $D(v_j)$, $S(v_j)$ and the data structures $F(v)$, $R(v)$ in $O(B^{\delta})$ I/Os. Each inserted point is inserted into $O(\log_B N)$ buffers $I(v)$. Hence, the amortized cost of rebuilding secondary data structures caused by an insertion is $O(B^{\delta}\log_B N/B^{3\delta}) = O(1/B^{\delta})$, because $\log_B N = O(B^{\delta})$. The cost of a deletion can be analyzed in the same way.
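The final equality uses the assumption $B^{\delta} > 4h\log_B N$ made at the beginning of this section; written out:

$$\frac{B^{\delta}\log_B N}{B^{3\delta}} = \frac{\log_B N}{B^{2\delta}} < \frac{B^{\delta}/(4h)}{B^{2\delta}} = \frac{1}{4h\,B^{\delta}} = O\!\left(\frac{1}{B^{\delta}}\right).$$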

We also ensure that the number of points stored in the sets $S(w)$ is not too small. Suppose that the number of points in some $S(w)$ is smaller than $B/2$ when the buffers of $parent(w)$ are emptied. If $w$ is a leaf or $S(w_j) = \emptyset$ for all children $w_j$ of $w$, we do not need to rebuild $S(w)$. Otherwise, we move some points from the sets $S(w_j)$ into $S(w)$. Using the data structure $R(w)$, we identify the $B - |S(w)|$ points with the highest y-coordinates in $\bigcup_j S(w_j)$. These points are removed from $R(w)$, $F(w)$, and $S(w_j)$ and inserted into $S(w)$. We also update $F(parent(w))$ and $R(parent(w))$. For every child $w_j$ of $w$, we recursively call the same procedure. The total cost of updating all data structures in a node is $O(B^{1-3\delta})$. Using standard analysis, we can show that maintaining the size of $S(w)$ incurs an amortized cost of $O(1/B^{\delta})$.

Besides that, we must ensure that each leaf contains the x-coordinates of at most $2B$ points. To maintain this invariant, the external priority tree is implemented as a weight-balanced B-tree (WBB-tree) [?]. The branching parameter of our WBB-tree equals $B^{\delta}$ and the leaf parameter equals $B$. When the total number of points stored in all descendants of a node $u$ on level $\ell$ equals $2B^{\ell\delta} \cdot B$, we split $u$ into $u'$ and $u''$. A node on level $\ell$ is split at most once after a series of $\Theta(B^{\ell\delta} \cdot B)$ insertions. When a node is split, we assign each element of $S(u)$, $I(u)$, and $D(u)$ to the corresponding set in $u'$ or $u''$. As a result, either $S(u')$ or $S(u'')$ may contain fewer than $B/2$ elements. In this case, we move points from the descendants of $u'$ into $u'$ (from the descendants of $u''$ into $u''$) as described above. The total amortized cost of splitting a node is $O(1/B^{\delta})$.

Answering Queries. Consider a query $Q = [a, b] \times [c, +\infty)$. Let $\pi$ denote the set of all nodes that lie on the path from the root to $\ell_a$ or on the path from the root to $\ell_b$, where $\ell_a$ and $\ell_b$ are the leaves that contain $a$ and $b$ respectively. Then all points inside the range $Q$ are stored in sets $S(v)$ or $I(v)$, where the node $v$ belongs to $\pi$ or $v$ is a descendant of a node that belongs to $\pi$. The following two facts play a crucial role in the reporting procedure.

Fact 1 Let $w$ be an ancestor of a node $v$. For any $p \in S(v)$ and $p' \in S(w)$, $p.y < p'.y$. For any $p \in I(v)$ and $p' \in S(w)$, $p.y < p'.y$.

Fact 2 Suppose that a point $p \in S(v)$ is deleted from $S$ (but $p$ is not deleted from $S(v)$ yet). Then $p$ belongs to a set $D(w)$ for an ancestor $w$ of $v$.

We set the value of the constant $h$ so that the height of $T$ does not exceed $h\log_B N$. As follows from Fact 2, the total number of deleted points in $S(v)$ is bounded by $h \cdot B^{3\delta}\log_B N < B/4$ (since $B^{\delta} > 4h\log_B N$ and $4\delta < 1$). Let $DEL(v) = \bigcup_{w = anc(v)} \bigl(D(w) \setminus \bigcup_{w' = anc(w)} I(w')\bigr)$, where $anc(u)$ denotes an ancestor of a node $u$. In other words, $DEL(v)$ is the set of all points $p$ such that $p$ belongs to some set $D(w)$ for an ancestor $w$ of $v$, but $p$ does not belong to any $I(w')$ for an ancestor $w'$ of $w$. By Fact 2, all points in $S(v) \cup I(v)$ that are already deleted from the data structure belong to $DEL(v)$. If the set $DEL(v)$ is known, then $DEL(v_i)$ for a child $v_i$ of $v$ can be constructed in $O(1)$ I/Os. Therefore we can construct $DEL(v)$ for all $v \in \pi$ in $O(\log_B N)$ I/Os.
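The incremental computation of $DEL$ along a root-to-leaf path can be sketched in Python. The sketch assumes that "ancestor" in the definition means proper ancestor (a buffer $D(v)$ refers only to points stored below $v$); each node is given as a pair of in-memory sets $(I(v), D(v))$, and `del_sets_along_path` is a hypothetical name.

```python
def del_sets_along_path(path):
    """Compute DEL(v) for every node v on a root-to-leaf path.

    path: list of (I, D) pairs of sets, from the root downwards.
    Returns a list whose k-th entry is DEL(path[k]).
    """
    result = []
    deleted = set()          # DEL of the current node
    inserted_above = set()   # union of I(w') over proper ancestors w' of the current node
    for I, D in path:
        result.append(set(deleted))
        # D(w) contributes to the DEL sets below w, except for points that were
        # re-inserted more recently, i.e. that sit in a buffer I(w') above w.
        deleted |= D - inserted_above
        inserted_above |= I
    return result
```

Each step touches only the two buffers of one node, which mirrors the $O(1)$ I/Os per child stated above.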

We can output all points that belong to $Q$ using the following procedure. Let $\pi_1$ be the path from $\ell_a$ to the lowest common ancestor $v_l$ of $\ell_a$ and $\ell_b$, and let $\pi_2$ be the path from $\ell_b$ to $v_l$. First, we examine all nodes $v \in \pi$ and report all points $p \in (S(v) \cup I(v)) \setminus DEL(v)$ that belong to $Q$; this can be done with $O(\log_B N)$ I/Os. All other points in $S \cap Q$ are stored in a set $S(u)$ or $I(u)$, where $u$ is a descendant of some $v \in \pi_1 \cup \pi_2$ or $u$ is a descendant of $v_l$.

Consider a node $v \in \pi_2$ such that $v \ne v_l$; we show how the points in $S(u) \cap Q$ for all descendants $u$ of $v$ can be reported. Suppose that the child $v_i$ of $v$ also belongs to $\pi_2$ and $rng(v_{i-1}) = [a', b']$. Let $Q_v = [a, b'] \times [c, +\infty)$. For a point $p$ stored in a descendant $u$ of $v$ such that $u \notin \pi_2$, $p$ belongs to $Q$ if and only if $p$ belongs to $Q_v$. All $p \in Q_v \cap S(u)$ are reported as follows. Initially we set $u = v$. We identify all points stored in $S(u_j) \cap Q_v$ or $I(u_j) \cap Q_v$ for some child $u_j$ of $u$ using the data structure $F(u)$. Then, we process the resulting list of points and remove all points that belong to $DEL(u)$. Finally, we identify all non-leaf children $u_j$ of $u$ such that at least $B/2$ points from $S(u_j)$ are reported. We visit every such $u_j$, compute $DEL(u_j)$, and recursively call the same procedure in $u_j$.
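The recursive reporting step can be sketched in Python. The sketch assumes no pending deletions (all $DEL$ sets empty) and replaces $F(u)$ by a linear scan over the children; `threshold` stands in for $B/2$, and `Node` and `report` are hypothetical names. Pruning is safe only under the invariants of the text: Fact 1, and $|S(u_j)| \ge$ `threshold` whenever $u_j$ has descendants.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    S: set = field(default_factory=set)           # S(u)
    I: set = field(default_factory=set)           # insertion buffer I(u)
    children: list = field(default_factory=list)

def report(u, x_lo, x_hi, c, threshold, out):
    """Append to `out` the points of [x_lo, x_hi] x [c, +inf) stored below u."""
    def inside(p):
        return x_lo <= p[0] <= x_hi and p[1] >= c
    for child in u.children:                      # a scan in place of F(u)
        hits = [p for p in child.S if inside(p)]
        out.extend(hits)
        out.extend(p for p in child.I if inside(p))
        # If |S(child)| >= threshold and rng(child) lies inside [x_lo, x_hi],
        # fewer hits mean some point of S(child) is below c; by Fact 1 nothing
        # deeper can qualify, so we descend only when enough points were hit.
        if child.children and len(hits) >= threshold:
            report(child, x_lo, x_hi, c, threshold, out)
    return out
```

Every recursive call is paid for by at least `threshold` reported points, which is the charging argument used below.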

Our procedure reports all points in $L(v) \cap Q_v$. Suppose that we visited a node $u$, but the child $u_j$ of $u$ was not visited. All points from $(S(u_j) \cup I(u_j)) \cap Q$ were reported when the node $u$ was visited. Since $S(u_j)$ contains at least $B/2$ points, at least one point $p_j \in S(u_j)$ does not belong to $Q$. The x-coordinate of $p_j$ belongs to $[a, b]$; hence, the y-coordinate of $p_j$ is smaller than $c$. By Fact 1, the y-coordinates of all points stored in $S(w) \cup I(w)$ for any descendant $w$ of $u_j$ are smaller than $c$. Hence, no point of $S(w) \cup I(w)$ is relevant for our query.

The search procedure spends $O(1)$ I/Os in every visited node (ignoring the cost of reporting points). Let $K_v$ be the total number of reported points in $L(v) \cap Q_v$. A node $u$ is visited only if at least $B/4$ points from $S(u)$ were reported. Thus we can charge at least $B/4$ points to every visited node. We can conclude that the search procedure spends $O(K_v/B)$ I/Os in the descendants of $v$. The descendants of nodes $v \in \pi_1$, $v \ne v_l$, and the descendants of $v_l$ can be processed with a similar procedure. Therefore the total query cost is $O(\log_B N + K/B)$. We obtain the following result.

Lemma 2 Suppose that $B^{\delta} > 4h\log_B N$ for the constant $h$ defined above and some $\delta < 1/4$. Then there exists a data structure that uses $O(N/B)$ blocks of space and answers three-sided reporting queries in $O(\log_B N + K/B)$ I/Os. The amortized cost of inserting or deleting a point is $O(1/B^{\delta})$.

A similar approach can be used to construct the data structure for general two-dimensional range 
reporting queries. 

Lemma 3 Suppose that $B^{\delta} > 4h_1\log_2 N$ for a constant $h_1$ defined in Appendix A and some $\delta < 1/4$. Then there is a data structure that uses $O((N/B)\log_2 N/\log_2\log_B N)$ blocks of space and answers two-dimensional orthogonal range reporting queries in $O(\log_B N + K/B)$ I/Os. The amortized cost of inserting or deleting a point is $O(1)$.

Our data structure uses the bufferization technique, but some additional ideas are needed to retain the $O(\log_B N + K/B)$ query cost and achieve the optimal space usage. We provide the details in Appendix A.
3 Two-Dimensional Range Reporting for Small $B$

It remains to consider the case when the block size $B$ is small. In this section we assume that $B = O(\log_2^{1/\delta} N)$ and describe several data structures for this case.

Reduction to Three-Sided Queries. We use a base structure that is similar to the structures in [17] and [6]. We construct the base tree $T$ with fan-out $\rho = O(\log_2^{\varepsilon} N)$ on the set of x-coordinates. In every node $v$ of $T$ we store data structures that support three-sided queries $[a, +\infty) \times [c, d]$ and $(-\infty, b] \times [c, d]$. The data structures for three-sided queries are implemented using the external priority search tree [6], so that the query and update costs are $O(\log_B N + K/B)$ and $O(\log_B N)$ respectively. In every node $v$, we also store a data structure that supports the following queries: for any $c \le d$ and any $1 \le i \le j \le \rho$, we can report all points $p$ such that $p.y \in [c, d]$ and $p$ is stored in a child $v_f$ of $v$ with $i \le f \le j$. In [6] the authors describe a linear-space data structure that supports such queries in $O(\rho + K/B)$ I/Os and updates in $O(\log_B N)$ I/Os.

To answer a query $[a, b] \times [c, d]$, we identify the lowest node $v$ such that $[a, b] \subseteq rng(v)$. Suppose that $[a, b]$ intersects $rng(v_l), rng(v_{l+1}), \dots, rng(v_r)$. We answer the three-sided queries $[a, +\infty) \times [c, d]$ and $(-\infty, b] \times [c, d]$ on the data structures for the nodes $v_l$ and $v_r$ respectively. Then, we report all points $p$ with $c \le p.y \le d$ stored in the nodes $v_{l+1}, \dots, v_{r-1}$. Since $\rho = O(\log_2^{\varepsilon} N) = O(\log_B N)$, the total query cost is $O(\log_B N + K/B)$.
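The split of $[a, b]$ at the node $v$ can be sketched in Python. The sketch assumes that each child range $rng(v_i)$ is identified by its smallest x-coordinate in the sorted list `child_lo` and that the children's points are in-memory lists; plain scans stand in for the three-sided structures and for the multi-child y-range structure of [6]. The names `plan_query` and `answer` are hypothetical.

```python
from bisect import bisect_right

def plan_query(child_lo, a, b):
    """Locate the children v_l..v_r of v whose ranges meet [a, b].

    child_lo[i] is the smallest x-coordinate of rng(v_i); the list is sorted
    and a >= child_lo[0] is assumed (v is the lowest node containing [a, b]).
    """
    l = bisect_right(child_lo, a) - 1
    r = bisect_right(child_lo, b) - 1
    return l, r, list(range(l + 1, r))

def answer(children_pts, child_lo, a, b, c, d):
    """Report the points of [a, b] x [c, d] from per-child point lists."""
    l, r, middle = plan_query(child_lo, a, b)
    def in_y(p):
        return c <= p[1] <= d
    out = [p for p in children_pts[l] if p[0] >= a and in_y(p)]   # [a, +inf) x [c, d] on v_l
    out += [p for p in children_pts[r] if p[0] <= b and in_y(p)]  # (-inf, b] x [c, d] on v_r
    for i in middle:                                             # y-range only on v_{l+1}..v_{r-1}
        out += [p for p in children_pts[i] if in_y(p)]
    return out
```

Because $v$ is the lowest node containing $[a, b]$, we have $l < r$, so no child is queried twice.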

Since each point is stored in $O(\log_2 N/\log_2\log_2 N)$ data structures, the space usage of the structure is $O((N/B)\log_2 N/\log_2\log_2 N) = O((N/B)\log_2 N/\log_2\log_B N)$ because $B = O(\log_2^{1/\delta} N)$. The update cost is $O(\log_B N \cdot \log_2 N/\log_2\log_2 N) = O(\log_B^2 N)$. Combining this result with Lemma 3, we obtain the following theorem.

Theorem 1 There is a data structure that uses $O((N/B)\log_2 N/\log_2\log_B N)$ blocks of space and answers orthogonal range reporting queries in two dimensions in $O(\log_B N + K/B)$ I/O operations. Updates are supported in $O(\log_B^2 N)$ amortized I/Os.
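The arithmetic behind the update and space bounds above can be spelled out under the assumption $B = O(\log_2^{1/\delta} N)$ of this section, which gives $\log_2 B = O(\log_2\log_2 N)$, together with the identity $\log_2 N = \log_B N \cdot \log_2 B$:

$$\log_B N\cdot\frac{\log_2 N}{\log_2\log_2 N} = \log_B N\cdot\log_B N\cdot\frac{\log_2 B}{\log_2\log_2 N} = O(\log_B^2 N),$$

and, since $\log_2\log_B N \le \log_2\log_2 N$, the space bound follows from $\frac{\log_2 N}{\log_2\log_2 N} \le \frac{\log_2 N}{\log_2\log_B N}$.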

Reduction to One-Dimensional Queries. We can obtain further results by reducing a two-dimensional query to a number of one-dimensional queries. We construct a standard range tree with constant fan-out on the x-coordinates of the points. All points that belong to the range of a node $v$ are stored in $v$. For any interval $[a, b]$, we can find $O(\log_2 N)$ nodes $u_l$ such that $p.x \in [a, b]$ if and only if $p$ is stored in one of the nodes $u_l$. Hence, all points in the query range $Q = [a, b] \times [c, d]$ can be reported by answering a one-dimensional query $Q_y = [c, d]$ in the $O(\log_2 N)$ nodes $u_1, \dots, u_t$ of the range tree. Using the fractional cascading technique, we can find the predecessor $d(u_l)$ of $d$ and the successor $c(u_l)$ of $c$ in all nodes $u_l$ in $O(\log_2 N\log_2\log_2 N)$ time; we refer to e.g. [11] for details. When we know $c(u_l)$ and $d(u_l)$, we can report all $K_l$ relevant elements stored in the node $u_l$ in $O(1 + K_l/B)$ I/Os. Hence, the query cost is $O(\log_2 N\log_2\log_2 N + K/B) = O(\log_B N(\log_2\log_B N)^2 + K/B)$, where the equality holds because $\log_2 N = \log_B N\log_2 B$ and $\log_2 B = O(\log_2\log_B N)$ for $B = O(\log_2^{1/\delta} N)$. Each point is stored in $O(\log_2 N)$ secondary data structures. As described in [11], the range tree augmented with fractional cascading data structures can be updated in $O(\log_2 N\log_2\log_2 N) = O(\log_B N(\log_2\log_B N)^2)$ time; hence, an update requires $O(\log_B N(\log_2\log_B N)^2)$ I/O operations.
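A static Python sketch of the one-dimensional reduction, assuming binary fan-out and points with distinct x-coordinates. It is not the dynamic structure of [11]: plain binary search in each canonical node replaces fractional cascading, so a query costs $O(\log^2 n)$ comparisons rather than the bound above. The class name `RangeTree` and its methods are hypothetical.

```python
from bisect import bisect_left, bisect_right

class RangeTree:
    """Static range tree with binary fan-out over points (x, y).

    A node is identified by its index range (lo, hi) into the x-sorted points
    and stores those points sorted by y, as in the text.
    """

    def __init__(self, points):
        self.pts = sorted(points)                  # sorted by x
        self.xs = [p[0] for p in self.pts]
        self.by_y = {}                             # (lo, hi) -> (points by y, their y keys)
        self._build(0, len(self.pts))

    def _build(self, lo, hi):
        node = sorted(self.pts[lo:hi], key=lambda p: p[1])
        self.by_y[(lo, hi)] = (node, [p[1] for p in node])
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self._build(lo, mid)
            self._build(mid, hi)

    def canonical(self, i, j, lo, hi):
        """Nodes whose disjoint union covers exactly the indices [i, j)."""
        if j <= lo or hi <= i:
            return []
        if i <= lo and hi <= j:
            return [(lo, hi)]
        mid = (lo + hi) // 2
        return self.canonical(i, j, lo, mid) + self.canonical(i, j, mid, hi)

    def query(self, a, b, c, d):
        """Report all points with a <= x <= b and c <= y <= d."""
        i, j = bisect_left(self.xs, a), bisect_right(self.xs, b)
        out = []
        for node in self.canonical(i, j, 0, len(self.pts)):
            pts, keys = self.by_y[node]
            # successor of c and predecessor of d inside this node
            out.extend(pts[bisect_left(keys, c):bisect_right(keys, d)])
        return out
```

Fractional cascading removes the repeated binary searches by linking each node's y-list to its children's lists, which is where the saving in the text comes from.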

Theorem 2 There exists a data structure that uses $O((N/B)\log_2 N)$ blocks of space and answers orthogonal range reporting queries in two dimensions in $O(\log_B N(\log_2\log_B N)^2 + K/B)$ I/O operations. Updates are supported in $O(\log_B N(\log_2\log_B N)^2)$ amortized I/Os.

Range Trees with $B^{O(1)}$ Fan-Out. Let $\varepsilon' = \varepsilon/10$. If points have integer coordinates, we can reduce the query cost by constructing a range tree with fan-out $B^{\varepsilon'}$. For every node $v$ and every pair of indices $i < j$, where $v_i, v_j$ are children of $v$, all points that belong to the children $v_i, \dots, v_j$ of $v$ belong to a list $L_{ij}(v)$. A data structure $E_{ij}(v)$ supports one-dimensional one-reporting queries on a set of integers. That is, $E_{ij}(v)$ enables us to find, for any interval $[c, d]$, some point $p \in L_{ij}(v)$ such that $p.y \in [c, d]$, if such a $p$ exists and all points have integer coordinates. As described in [13], we can implement $E_{ij}(v)$ so that queries are supported in $O(1)$ time and updates are supported in $O(\log_2^{\varepsilon'} N)$ randomized time. Using $E_{ij}(v)$, it is straightforward to report all $p \in L_{ij}(v)$ with $p.y \in [c, d]$ in $O(1 + K/B)$ I/Os. Consider a query $Q = [a, b] \times [c, d]$. We can find, in $O(\log_B N)$ I/O operations, $O(\log_B N)$ nodes $u_t$ and index ranges $[i_t, j_t]$ such that the x-coordinate of a point $p$ belongs to $[a, b]$ if and only if $p$ is stored in some list $L_{i_t j_t}(u_t)$. Hence, all points in $Q$ can be reported by reporting all points in the lists $L_{i_t j_t}(u_t)$ whose y-coordinates belong to $[c, d]$. The total query cost is $O(\log_2 N/\log_2\log_2 N + K/B) = O(\log_B N + K/B)$. However, the space usage is $O((N/B)\log_2^{1+8\varepsilon'} N)$ because each point is stored in $O(B^{2\varepsilon'}\log_B N) = O(\log_2^{1+8\varepsilon'} N)$ lists $L_{ij}(v)$. We can reduce the space usage if only parts of the lists $L_{ij}(v)$ are stored explicitly.
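How a query interval decomposes into a few triples $(u_t, i_t, j_t)$ can be sketched in Python. The sketch works on leaf indices $[lo, hi)$ rather than coordinates, assumes that the children of a node split its leaf range into consecutive slices of $\lceil (e-s)/\mathit{fanout} \rceil$ leaves, and uses the hypothetical names `decompose` and `covered`. On each level at most two children are only partially covered, which gives the $O(\log_B N)$ bound on the number of lists.

```python
def decompose(s, e, lo, hi, fanout, out):
    """Append triples ((s, e), i, j) that exactly cover leaf indices [lo, hi).

    (s, e) is the leaf-index range of the current node; a triple stands for
    the list L_ij of that node, i.e. all points in its children i..j.
    """
    if hi <= s or e <= lo:
        return out                                   # no overlap
    if e - s == 1:
        out.append(((s, e), 0, 0))                   # a single leaf, fully covered
        return out
    step = -(-(e - s) // fanout)                     # leaves per child slice
    kids = [(t, min(t + step, e)) for t in range(s, e, step)]
    full = [k for k, (cs, ce) in enumerate(kids) if lo <= cs and ce <= hi]
    if full:
        out.append(((s, e), full[0], full[-1]))      # one list for the full run of children
    for cs, ce in kids:
        if cs < hi and lo < ce and not (lo <= cs and ce <= hi):
            decompose(cs, ce, lo, hi, fanout, out)   # partially covered child
    return out

def covered(triple, fanout):
    """Leaf indices represented by a triple, using the same slicing rule."""
    (s, e), i, j = triple
    step = -(-(e - s) // fanout)
    return range(s + i * step, min(s + (j + 1) * step, e))
```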

Let $L(v)$ denote the list of all points that belong to a node $v$, sorted by their y-coordinates. We divide $L(v)$ into groups of consecutive points $G_s(v)$, $s = O(|L(v)|/B^{1+2\varepsilon'})$, so that each $G_s(v)$ contains at least $B^{1+2\varepsilon'}/2$ and at most $2B^{1+2\varepsilon'}$ points. Instead of $L_{ij}(v)$, we store a list $\widetilde{L}_{ij}(v)$. The main idea of our space-saving method is that we need to store the points of $G_s(v)$ in $\widetilde{L}_{ij}(v)$ only when $G_s(v)$ contains few points from $L_{ij}(v)$. Otherwise, all relevant points can be found by querying the set $G_s(v)$, provided that $\widetilde{L}_{ij}(v)$ contains a pointer to $G_s(v)$. Points and pointers are stored in each list $\widetilde{L}_{ij}(v)$ according to the following rules. If $|L_{ij}(v) \cap G_s(v)| < B/2$, the list $\widetilde{L}_{ij}(v)$ contains all points from $L_{ij}(v) \cap G_s(v)$. If $|L_{ij}(v) \cap G_s(v)| > 2B$, the list $\widetilde{L}_{ij}(v)$ contains a pointer $ptr_s$ to $G_s(v)$. We also store the minimal and maximal y-coordinates of the points in $L_{ij}(v) \cap G_s(v)$ with each pointer to $G_s(v)$ from $\widetilde{L}_{ij}(v)$. If $B/2 \le |L_{ij}(v) \cap G_s(v)| \le 2B$, $\widetilde{L}_{ij}(v)$ contains either a pointer to $G_s(v)$ or all points from $L_{ij}(v) \cap G_s(v)$.

Instead of $E_{ij}(v)$, we use several other auxiliary data structures. A data structure $\widetilde{E}_{ij}(v)$ contains information about the elements of $\widetilde{L}_{ij}(v)$: for each point $p \in \widetilde{L}_{ij}(v)$ we store $p.y$ in $\widetilde{E}_{ij}(v)$, and for every pointer $ptr_s$, $\widetilde{E}_{ij}(v)$ contains both the minimal and the maximal y-coordinate associated with $ptr_s$. A data structure $E(v)$ contains the y-coordinates of all points in $L(v)$. Both $E(v)$ and all $\widetilde{E}_{ij}(v)$ support one-reporting queries as described above. A data structure $H_s(v)$ supports orthogonal range reporting queries on $G_s(v)$. Using the data structure described in Lemma 1 of [6], we can answer three-sided reporting queries in $O(1 + K/B)$ I/Os using $O(|G_s(v)|/B)$ blocks of space. Using the standard approach, we can extend this result to a data structure that uses $O((|G_s(v)|\log_2 B)/B)$ blocks and answers orthogonal range reporting queries in $O(1 + K/B)$ I/Os.

Now we show how we can report all points $p \in L_{ij}(v)$ with $p.y \in [c, d]$ without storing $L_{ij}(v)$. Using $\widetilde{E}_{ij}(v)$, we look for an element $e$ of $\widetilde{L}_{ij}(v)$ with a y-coordinate in $[c, d]$. Suppose that such an $e$ is found. Then, we traverse $\widetilde{L}_{ij}(v)$ in the $+y$ direction starting at $e$ until a point $p$ with $p.y > d$ or a pointer to a group $G_s(v)$ with minimal y-coordinate larger than $d$ is found. We also traverse $\widetilde{L}_{ij}(v)$ in the $-y$ direction until a point $p$ with $p.y < c$ or a pointer to a group $G_s(v)$ with maximal y-coordinate smaller than $c$ is found. For every pointer in the traversed portion of $\widetilde{L}_{ij}(v)$, we visit the corresponding group $G_s(v)$ and report all points $p \in G_s(v) \cap L_{ij}(v)$ with $p.y \in [c, d]$. All relevant points in $G_s(v)$ can be reported in $O(1 + K_s/B)$ I/Os using the data structure $H_s(v)$; here $K_s$ denotes the number of points reported by $H_s(v)$. By the definition of $\widetilde{L}_{ij}(v)$, the set $G_s(v) \cap L_{ij}(v)$ contains at least $B/2$ points if there is a pointer $ptr$ from $\widetilde{L}_{ij}(v)$ to $G_s(v)$. Unless $ptr$ is the first or the last element in the traversed portion of $\widetilde{L}_{ij}(v)$, $G_s(v)$ contains at least $B/2$ points from $[a, b] \times [c, d]$. Since $B$ consecutive elements of $\widetilde{L}_{ij}(v)$ contain either $B$ points or at least one pointer to a group $G_s(v)$, the total cost of reporting all points in $L_{ij}(v)$ with $p.y \in [c, d]$ is $O(1 + K/B)$.

Now we consider the situation when there is no $e \in \widetilde{E}_{ij}(v)$ such that $e \in [c, d]$. In this case, $L_{ij}(v)$ may contain points from the range $Q$ only if all points $p \in L(v)$ with $p.y \in [c, d]$ belong to one group $G_s(v)$. Using $E(v)$, we search for a point $p_s \in L(v)$ such that $p_s.y \in [c, d]$. If there is no such $p_s$, then no point of $L(v)$ has a y-coordinate in $[c, d]$. Otherwise $p_s \in G_s(v)$ for some $s$, and we can report all points in $Q \cap G_s(v)$ in $O(1 + K/B)$ I/Os using $H_s(v)$. We need to visit $O(\log_B N)$ nodes of the range tree to answer the query; hence, the total query cost is $O(\log_B N + K/B)$ I/Os.
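The pointer rules and the traversal can be sketched in Python under simplifying assumptions: everything is in memory, groups are fixed slices of `group_size` consecutive points of $L(v)$, a group becomes a pointer whenever it holds at least `half_b` members (one of the choices the rules allow for counts between $B/2$ and $2B$), binary search over entry keys replaces the one-reporting structure on the stored list, and a scan of the group replaces $H_s(v)$. The names `compress`, `low`, `high`, and `query` are hypothetical.

```python
from bisect import bisect_left

def compress(L, group_size, member, half_b):
    """Build the stored list for L_ij(v) from L(v).

    L:       points (x, y) of L(v) sorted by y (distinct y assumed).
    member:  predicate telling whether a point lies in children v_i..v_j.
    half_b:  the B/2 threshold; a group with at least this many members is
             represented by a pointer.
    Returns (groups, entries); an entry is ('pt', p) or ('ptr', s, ymin, ymax).
    """
    groups = [L[k:k + group_size] for k in range(0, len(L), group_size)]
    entries = []
    for s, g in enumerate(groups):
        mine = [p for p in g if member(p)]          # members of group s, sorted by y
        if len(mine) >= half_b:
            entries.append(('ptr', s, mine[0][1], mine[-1][1]))
        else:
            entries.extend(('pt', p) for p in mine)
    return groups, entries

def low(e):
    return e[1][1] if e[0] == 'pt' else e[2]        # smallest y covered by an entry

def high(e):
    return e[1][1] if e[0] == 'pt' else e[3]        # largest y covered by an entry

def query(groups, entries, member, c, d):
    """Report all p in L_ij(v) with c <= p.y <= d."""
    keys = [high(e) for e in entries]               # nondecreasing; rebuilt per call for brevity
    out = []
    k = bisect_left(keys, c)                        # first entry that reaches y >= c
    while k < len(entries) and low(entries[k]) <= d:
        e = entries[k]
        if e[0] == 'pt':
            out.append(e[1])                        # its y lies in [c, d]
        else:                                       # stands in for a query to H_s(v)
            out.extend(p for p in groups[e[1]] if member(p) and c <= p[1] <= d)
        k += 1
    return out
```

Entries are ordered by y and cover disjoint y-intervals because groups are consecutive in $L(v)$, which is what makes the single forward walk sufficient.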

Since the full lists $L_{ij}(v)$ are not stored, the space usage is reduced to $O((N/B)\log_2 N)$. For each group $G_s(v)$, the stored (compressed) list $\widetilde{L}_{ij}(v)$ contains at most $2B$ points of $G_s(v)$ or a single pointer to $G_s(v)$. Since $L(v)$ is divided into $O(|L(v)|/B^{1+2\varepsilon'})$ groups and there are $O(B^{2\varepsilon'})$ pairs $(i, j)$, the total size of all $\widetilde{L}_{ij}(v)$ is $O(|L(v)|)$. All data structures $H_s(v)$ for all groups $G_s(v)$ use $O((|L(v)|\log_2 B)/B)$ blocks of space. Each point belongs to $O(\log_B N)$ nodes; therefore the total space usage is $O((N/B)\log_B N\log_2 B) = O((N/B)\log_2 N)$.

When a new point $p$ is inserted, we must insert it into $O(\log_B N)$ lists $L(v)$. Suppose that $p$ is inserted into $G_s(v)$ in a node $v$. We insert $p$ into $H_s(v)$ in $O(\log_2 B)$ I/Os; $p$ is also inserted into up to $B^{2\varepsilon'} = O(\log_2^{8\varepsilon'} N)$ lists $\widetilde{L}_{ij}(v)$. The one-dimensional reporting data structure for $\widetilde{L}_{ij}(v)$ supports updates in $O(\log_2^{\varepsilon'} N)$ I/Os; hence, the total cost of inserting a point into these lists is $O(\log_2^{9\varepsilon'} N)$. For each pair $i < j$ we check whether the number of points in $L_{ij}(v) \cap G_s(v)$ equals $2B$. Although the list $L_{ij}(v)$ is not stored, we can determine the number of points in $L_{ij}(v) \cap G_s(v)$ by a query to the data structure $H_s(v)$. If $|L_{ij}(v) \cap G_s(v)| = 2B$, we remove all points of $L_{ij}(v) \cap G_s(v)$ from $\widetilde{L}_{ij}(v)$ and insert a pointer to $G_s(v)$ into $\widetilde{L}_{ij}(v)$. Points in a list $\widetilde{L}_{ij}(v)$ are replaced with a pointer to a group $G_s(v)$ at most once for a sequence of $\Theta(B)$ insertions into $G_s(v)$. Hence, the amortized cost of updating $\widetilde{L}_{ij}(v)$ because the number of points from $L_{ij}(v)$ in a group exceeds $2B$ is $O(\log_2^{\varepsilon'} N)$ I/Os. Each insertion affects $O(\log_2^{8\varepsilon'} N)$ lists $\widetilde{L}_{ij}(v)$. If the number of points in $G_s(v)$ equals $2B^{1+2\varepsilon'}$, we split $G_s(v)$ into two groups of $B^{1+2\varepsilon'}$ points each. Since up to $B$ elements can be inserted into and deleted from every list $\widetilde{L}_{ij}(v)$, the amortized cost incurred by splitting a group is $O(\log_2^{9\varepsilon'} N)$. Thus the total cost of inserting a point into the data structures associated with a node $v$ is $O(\log_2^{9\varepsilon'} N)$ I/Os. Since a new point is inserted into $O(\log_2 N/\log_2\log_2 N)$ nodes of the range tree, the total cost of an insertion is $O(\log_2^{1+9\varepsilon'} N/\log_2 B) = O(\log_B^{1+\varepsilon} N)$. Deletions are processed in a symmetric way.
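The last step of the update analysis, written out. With fan-out $B^{\varepsilon'}$ the tree has $O(\log_B N)$ levels; taking the per-node cost as $O(\log_2^{9\varepsilon'} N)$ and $\varepsilon' = \varepsilon/10$:

$$O(\log_B N)\cdot O\bigl(\log_2^{9\varepsilon'} N\bigr) = O\bigl(\log_B N\cdot\log_2^{0.9\varepsilon} N\bigr) = O\bigl(\log_B^{1+\varepsilon} N\bigr),$$

where the last step holds because $\log_B^{\varepsilon} N = \log_2^{\varepsilon} N/\log_2^{\varepsilon} B$ and $\log_2 B = O(\log_2\log_2 N)$ grows more slowly than any positive power of $\log_2 N$, so $\log_2^{0.9\varepsilon} N = O(\log_B^{\varepsilon} N)$.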

Combining this result with Lemma 3, we obtain the following theorem.

Theorem 3 Suppose that point coordinates are integers. There exists a data structure that uses $O((N/B)\log_2 N)$ blocks of space and answers orthogonal range reporting queries in two dimensions in $O(\log_B N + K/B)$ I/O operations. Updates are supported in $O(\log_B^{1+\varepsilon} N)$ amortized I/Os w.h.p. for any $\varepsilon > 0$.
Appendix A. Two-Dimensional Range Reporting for $B = \Omega(\log_2^{1/\delta} N)$

We maintain a constant fan-out tree $T$ on the set of x-coordinates of all points. An internal node of $T$ has at most eight children. A point $p$ belongs to an internal node $v$ if its x-coordinate is stored in a leaf descendant of $v$. We assume that the height of $T$ is bounded by $h_1\log_2 N$. Each node $v$ contains two secondary data structures that support three-sided queries of the form $[a, +\infty) \times [c, d]$ and $(-\infty, b] \times [c, d]$ respectively; both data structures contain all points that belong to $v$. We also store all points that belong to $v$ in a B-tree sorted by their y-coordinates, so that all points with y-coordinates in an interval $[c, d]$ can be reported. The data structures for three-sided queries are implemented as described in Lemma 2. We implement the B-tree using the result of [5], so that updates are supported in $O(1/B^{\delta})$ I/Os. Hence, updates on the secondary data structures are supported in $O(1/B^{\delta})$ I/Os.

We say that a node $v$ of $T$ is special if the depth of $v$ is divisible by $\lceil \delta\log_2 B/3 \rceil$. To facilitate query processing, buffers with inserted and deleted elements are stored in the special nodes only. A node $u$ is a direct special descendant of $v$ if $u$ is a special node, $u$ is a descendant of $v$, and there is no other special node $u'$ on the path from $v$ to $u$. We denote by $desc(v)$ the set of direct special descendants of a node $v$. The set of nodes $subset(v)$ consists of the node $v$ and all nodes $w$ such that $w$ is a descendant of $v$ and $w$ is an ancestor of some node $u \in desc(v)$. In other words, every node $w$ on a path from $v$ to one of its direct special descendants (excluding that descendant) belongs to $subset(v)$; the node $v$ also belongs to $subset(v)$.
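A small Python helper makes the size bound on $subset(v)$ and $desc(v)$ concrete. It assumes a complete tree in which every internal node has exactly eight children (the worst case allowed above); with period $k = \lceil \delta\log_2 B/3\rceil$ we get $8^k = 2^{3k} \le 8\cdot 2^{\delta\log_2 B} = 8B^{\delta}$. The function names are hypothetical.

```python
import math

def special_period(B, delta):
    """Depths divisible by this value are special: ceil(delta * log2(B) / 3)."""
    return math.ceil(delta * math.log2(B) / 3)

def is_special(depth, B, delta):
    return depth % special_period(B, delta) == 0

def subset_and_desc_sizes(B, delta, fanout=8):
    """|subset(v)| and |desc(v)| for a special node v in a complete tree."""
    k = special_period(B, delta)
    subset = sum(fanout ** t for t in range(k))   # v and all nodes fewer than k levels below it
    desc = fanout ** k                            # the special nodes exactly k levels below v
    return subset, desc
```

For example, $B = 2^{48}$ and $\delta = 1/8$ give period $2$, so $|subset(v)| = 9$ and $|desc(v)| = 64 = B^{\delta}$.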

Let $I(v)$ and $D(v)$ denote the buffers of inserted and deleted points stored in a node $v \in T$.
When a point is inserted, we add it to the buffer $I(v_r)$, where $v_r$ is the root of $T$. When a
buffer $I(v)$ contains at least $B^{2\delta}$ elements, we visit every node $w \in subset(v)$ and insert all points
$p \in I(v) \cap rng(w)$ into the secondary data structures of the node $w$. Then, we examine all nodes
$u \in desc(v)$. For every $u \in desc(v)$, we insert all points $p \in I(v) \cap rng(u)$ into $I(u)$ and remove
all points $p \in (I(v) \cap rng(u)) \cap D(u)$ from $D(u)$. Finally, we set $I(v) = \emptyset$.
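Below is a minimal Python sketch of emptying an insertion buffer. It models buffers and secondary structures as Python sets of `(x, y)` tuples and treats `rng(w)` as a closed $x$-interval. The callables `subset_of`, `desc_of`, and `rng` are hypothetical hooks that stand in for tree navigation, and `threshold` plays the role of $B^{2\delta}$. The set difference on `D[u]` implements the step that removes pending deletions of points that are being reinserted.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Point = Tuple[float, float]

def flush_insert_buffer(
    v: Hashable,
    I: Dict[Hashable, Set[Point]],
    D: Dict[Hashable, Set[Point]],
    secondary: Dict[Hashable, Set[Point]],
    subset_of: Callable[[Hashable], Iterable[Hashable]],
    desc_of: Callable[[Hashable], Iterable[Hashable]],
    rng: Callable[[Hashable], Tuple[float, float]],
    threshold: int,
) -> bool:
    """Empty I[v] once it holds at least `threshold` points; return True if flushed."""
    if len(I[v]) < threshold:
        return False
    batch = I[v]
    # Step 1: apply the batch to the secondary structures of every w in subset(v).
    for w in subset_of(v):
        lo, hi = rng(w)
        secondary[w].update(p for p in batch if lo <= p[0] <= hi)
    # Step 2: push the batch into the buffers of the direct special descendants.
    for u in desc_of(v):
        lo, hi = rng(u)
        moved = {p for p in batch if lo <= p[0] <= hi}
        I[u] |= moved  # pending insertions for the subtree of u
        D[u] -= moved  # drop pending deletions of the reinserted points
    # Step 3: the buffer of v is now empty.
    I[v] = set()
    return True
```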

The total number of nodes in $subset(v)$ and $desc(v)$ is $O(B^{\delta})$. Since each point is inserted
into $O(\log_2 B)$ data structures and the buffer contains $O(B^{2\delta})$ points, all data structures in
$subset(v)$ can be updated in $O(B^{2\delta} \log_2 B / B^{\delta}) = O(B^{\delta} \log_2 B)$ I/Os. We can also update the
buffers $I(u)$ and $D(u)$ for each $u \in desc(v)$ in $O(1)$ I/Os. Hence, a buffer $I(v)$ can be emptied in
$O(B^{\delta} \log_2 B)$ I/Os. Since a buffer $I(v)$ is emptied once after $\Theta(B^{2\delta})$ points were inserted into $I(v)$,
the amortized cost of an insertion into $I(v)$ is $O(\log_2 B / B^{\delta})$. An insertion of a point $p$ into the data
structure leads to insertions of $p$ into $O(\log_B N)$ buffers $I(v)$. Hence, the amortized cost of inserting
a point $p$ is $O(\log_2 N / B^{\delta}) = O(1)$. Deletions are processed by a symmetric procedure.
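The amortization argument can be written as a chain of bounds. The first line adds the cost of updating the secondary structures in $subset(v)$ to the $O(1)$ I/Os spent on each of the $O(B^{\delta})$ buffers in $desc(v)$. The last line uses $\log_B N \cdot \log_2 B = \log_2 N$ and the assumption $B^{\delta} > 4\log_2 N$ of Lemma 4.

```latex
\begin{align*}
\text{emptying } I(v) &: \;
  O\!\left(\frac{B^{2\delta}\log_2 B}{B^{\delta}}\right) + O(B^{\delta}) \cdot O(1)
  = O(B^{\delta}\log_2 B), \\
\text{per insertion into } I(v) &: \;
  \frac{O(B^{\delta}\log_2 B)}{\Theta(B^{2\delta})} = O\!\left(\frac{\log_2 B}{B^{\delta}}\right), \\
\text{per inserted point} &: \;
  O(\log_B N)\cdot O\!\left(\frac{\log_2 B}{B^{\delta}}\right)
  = O\!\left(\frac{\log_2 N}{B^{\delta}}\right) = O(1).
\end{align*}
```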

Consider a query $Q = [a, b] \times [c, d]$. We identify the node $v$ of $T$ such that $[a, b] \subseteq rng(v)$, but
$[a, b] \not\subseteq rng(v_i)$ for any child $v_i$ of $v$. Suppose that $[a, b]$ intersects with $rng(v_l), \ldots, rng(v_r)$, where
$1 \le l < r \le 8$. All points $p \in S \cap Q$ are stored in the secondary structures of nodes $v_l, \ldots, v_r$ or in
buffers of the special ancestors of $v$ (possibly including the node $v$ itself). We start by constructing the
sets $INS(v)$ and $DEL(v)$. The set $INS(v)$ contains all points $p$ such that $p \in I(u)$ for an ancestor
$u$ of $v$, but $p \notin D(u')$ for every ancestor $u'$ of $u$. The set $DEL(v)$ contains all points $p$ such that
$p \in D(u)$ for an ancestor $u$ of $v$, but $p \notin I(u')$ for every ancestor $u'$ of $u$. Only $O(\log_B N)$ ancestors of
$v$ are special nodes, and every buffer stored in a special node contains at most $B^{2\delta}$ points. Hence,
both $INS(v)$ and $DEL(v)$ can be constructed in $O(\log_B N)$ I/Os and together contain at most $h_1 B^{2\delta} \log_B N < B/4$
points. We output all points $p \in INS(v) \cap Q$ in $O(1)$ I/Os. Let $V$ be the list of all points $p$
such that $p$ belongs to $Q$ and $p$ is stored in a child of $v$. The list $V$ can be generated as follows.
First, we answer the three-sided queries $[a, +\infty) \times [c, d]$ and $(-\infty, b] \times [c, d]$ on the data structures of nodes
$v_l$ and $v_r$, respectively. Then, we identify all points $p$ stored in a node $v_j$, $l < j < r$, such that
$c \le p.y \le d$. When the list $V$ is constructed, we traverse $V$ and output all points of $V$ that do not
belong to the set $DEL(v)$. The list $V$ can be generated and traversed in $O(\log_B N + |V|/B)$ I/Os. Since
the total number of points in the answer is $K \ge |V| - B/4$, all points of $V \setminus DEL(v)$ can be identified
and reported in $O(\log_B N + K/B)$ I/Os.
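A minimal Python sketch of how $INS(v)$, $DEL(v)$, and the list $V$ are combined follows. It assumes buffers are Python sets and that `path` lists the $(I(u), D(u))$ pairs of the special ancestors of $v$ from the root downwards (a hypothetical representation). Updates move from the root towards the leaves, so a buffer closer to the root holds more recent updates. For this reason a point in $I(u)$ is discarded if a higher buffer deletes it, and vice versa. `V` stands for the points in $Q$ returned by the secondary structures of $v_l, \ldots, v_r$.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float]

def pending_updates(path: List[Tuple[Set[Point], Set[Point]]]) -> Tuple[Set[Point], Set[Point]]:
    """Compute INS(v) and DEL(v) from (I(u), D(u)) pairs ordered root first."""
    ins: Set[Point] = set()
    dels: Set[Point] = set()
    inserted_above: Set[Point] = set()  # union of I(u') over ancestors u' of the current u
    deleted_above: Set[Point] = set()   # union of D(u') over ancestors u' of the current u
    for I_u, D_u in path:
        ins |= {p for p in I_u if p not in deleted_above}
        dels |= {p for p in D_u if p not in inserted_above}
        inserted_above |= I_u
        deleted_above |= D_u
    return ins, dels

def report(Q, path, V: Iterable[Point]) -> List[Point]:
    """Answer Q = ((a, b), (c, d)) from the pending buffers and the list V."""
    (a, b), (c, d) = Q
    ins, dels = pending_updates(path)
    from_buffers = {p for p in ins if a <= p[0] <= b and c <= p[1] <= d}
    from_structures = {p for p in V if p not in dels}
    return sorted(from_buffers | from_structures)
```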

Our result is summarized in the following lemma.

Lemma 4 Suppose that $B^{\delta} > 4 \log_2 N$ for a constant $\delta < 1/4$. Then there exists a data structure
that uses $O((N/B) \log_2 N)$ blocks of space and answers two-dimensional orthogonal range reporting
queries in $O(\log_B N + K/B)$ I/Os. The amortized cost of inserting or deleting a point is $O(1)$.

We can slightly improve the space usage by increasing the fan-out of the base tree. The construction
is the same as above, but every internal node has $\Theta(\log_B N)$ children. We also store an additional
data structure $H(v)$ in every internal node $v$ of $T$. For any $l < r$ and any $c < d$, $H(v)$ enables
us to efficiently report all points $p$ such that $p.y \in [c, d]$ and $p$ is stored in a child $v_j$ of $v$ with
$l \le j \le r$. The data structure $H(v)$ is described below.

Let $L(v)$ denote the list of all points that belong to $v$, sorted by $y$-coordinate. Let $Y(v)$ be the set of
$y$-coordinates of all points in $L(v)$. For every child $v_i$ of $v$ and every point $p \in L(v_i)$, $H(v)$ contains
a "point" $r(p) = (p.y, succ(p.y, Y(v_i)))$. For a query $c$, $H(v)$ returns all points $p \in L(v)$ such that
$r(p) \in (-\infty, c] \times [c, +\infty)$. In other words, we can report all points $p \in L(v)$ such that $p.y \le c$ and
$succ(p.y, Y(v_i)) \ge c$. An answer to a query contains $O(\log_B N)$ points: at most one point for each
child $v_i$. Using Lemma 2, $H(v)$ supports queries in $O(\log_B N + K) = O(\log_B N)$ I/Os
and updates in $O(1/B^{\delta})$ amortized I/Os.
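A brute-force Python sketch of the content of $H(v)$ follows. It assumes each child list $L(v_i)$ is given as a sorted list of distinct $y$-coordinates. The linear scan in `query_H` is only a stand-in for the external-memory structure the text uses. Two choices here are assumptions not stated in the text. First, every list gets a sentinel with $y = -\infty$, so that a child whose points all lie above $c$ still yields an entry. Second, the test is the half-open $p.y < c \le succ(p.y, Y(v_i))$, which returns exactly one entry per child when $y$-coordinates are distinct.

```python
import math
from typing import List, Tuple

Entry = Tuple[float, float, int, int]  # (p.y, succ(p.y, Y(v_i)), i, pos)

def build_H(children: List[List[float]]) -> List[Entry]:
    """Return the entries r(p), tagged with the child index i and position pos.

    pos is the position of p in L(v_i) after a -inf sentinel is prepended,
    so pos == 0 denotes the sentinel (an assumption of this sketch).
    """
    H: List[Entry] = []
    for i, ys in enumerate(children):
        padded = [-math.inf] + ys
        for pos, y in enumerate(padded):
            succ = padded[pos + 1] if pos + 1 < len(padded) else math.inf
            H.append((y, succ, i, pos))
    return H

def query_H(H: List[Entry], c: float) -> List[Tuple[int, int]]:
    """Brute-force two-sided query: entries with y < c <= succ, one per child."""
    return [(i, pos) for (y, succ, i, pos) in H if y < c <= succ]
```

For example, with children `[[1, 4, 9], [2, 3], [7, 8]]` and $c = 5$, `query_H` returns `[(0, 2), (1, 2), (2, 0)]`. These are the entries for $y = 4$, for $y = 3$, and for the sentinel of the third child. Scanning each padded list forward from the returned position yields exactly the points with $y \ge c$.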

We can report all points $p \in L(v_j)$ such that $p.y \in [c, d]$ and $l \le j \le r$ as follows. Using $H(v)$,
we find all points $p$ such that $p.y \le c$ and $succ(p.y, Y(v_i)) \ge c$. For every found point $p \in L(v_j)$ with $l \le j \le r$,
we traverse the list $L(v_j)$ and report all points that follow $p$ until a point $p'$ with $p'.y > d$ is found.
The total query cost is $O(\log_B N + K/B)$.
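The cost bound can be spelled out as a short derivation. Write $K_j$ for the number of points reported from $L(v_j)$ and $K = \sum_j K_j$. The derivation assumes, as the traversal argument implicitly does, that each list $L(v_j)$ is stored in consecutive blocks. Under that assumption, reading $K_j$ consecutive points after $p$ costs $O(1 + K_j/B)$ I/Os. The last step uses the fact that $r - l + 1$ is at most the fan-out, $\Theta(\log_B N)$.

```latex
\begin{align*}
\text{cost} &= \underbrace{O(\log_B N)}_{\text{query on } H(v)}
  + \sum_{j=l}^{r} O\!\left(1 + \frac{K_j}{B}\right) \\
&= O(\log_B N) + O(r - l + 1) + O\!\left(\frac{K}{B}\right)
 = O\!\left(\log_B N + \frac{K}{B}\right).
\end{align*}
```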

The global data structure supports insertions and deletions of points in the same way as shown
in Lemma 4. The query answering procedure is also very similar to the procedure in the proof of
Lemma 4. We identify the node $v$ of $T$ such that $[a, b] \subseteq rng(v)$, but $[a, b] \not\subseteq rng(v_i)$ for any child
$v_i$ of $v$. We also find the children $v_l, \ldots, v_r$ of $v$ such that $[a, b]$ intersects with $rng(v_l), \ldots, rng(v_r)$.
The sets $INS(v)$ and $DEL(v)$ and the list $V$ can be generated as described above. The only difference
is that we identify all points $p$ stored in nodes $v_j$, $l < j < r$, such that $c \le p.y \le d$ using the data
structure $H(v)$.